3,153 research outputs found

    On the Provision of a Comprehensive Computer Graphics Education in the Context of Computer Games

    Position paper for the ACM SIGGRAPH/Eurographics Computer Graphics Education Workshop 200

    Model-Based Reinforcement Learning with Continuous States and Actions

    Finding an optimal policy in a reinforcement learning (RL) framework with continuous state and action spaces is challenging, and approximate solutions are often inevitable. GPDP is an approximate dynamic programming algorithm based on Gaussian process (GP) models of the value functions. In this paper, we extend GPDP to the case of unknown transition dynamics. After building a GP model of the transition dynamics, we apply GPDP to this model and determine a continuous-valued policy over the entire state space. We apply the resulting controller to the underpowered pendulum swing-up. Moreover, we compare our results on this RL task to a nearly optimal discrete DP solution in a fully known environment.
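    The extension described above hinges on first fitting a GP model of the unknown transition dynamics from observed transitions, and only then running GPDP on that model. The sketch below illustrates just that first step, under assumptions the abstract does not state (one independent GP per state dimension, an RBF kernel, and scikit-learn as the GP library); it is not the authors' implementation.

```python
# Minimal sketch (assumptions noted above): learn a GP model of unknown
# transition dynamics from observed (state, action) -> next-state tuples.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_dynamics_gp(states, actions, next_states):
    """Fit one GP per state dimension, mapping (s, a) -> s'."""
    X = np.hstack([states, actions])
    models = []
    for d in range(next_states.shape[1]):
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                      normalize_y=True)
        gp.fit(X, next_states[:, d])
        models.append(gp)
    return models

def predict_next_state(models, state, action):
    """Predict the mean next state for a single (state, action) pair."""
    x = np.hstack([state, action]).reshape(1, -1)
    return np.array([gp.predict(x)[0] for gp in models])
```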

    Approximate Dynamic Programming with Gaussian Processes

    In general, it is difficult to determine an optimal closed-loop policy in nonlinear control problems with continuous-valued state and control domains, so approximations are often inevitable. The standard method of discretizing states and controls suffers from the curse of dimensionality and depends strongly on the chosen temporal sampling rate. In this paper, we introduce Gaussian process dynamic programming (GPDP) and determine an approximate, globally optimal closed-loop policy. In GPDP, the value functions in the Bellman recursion of the dynamic programming algorithm are modeled using Gaussian processes. GPDP returns an optimal state feedback for a finite set of states. Based on these outcomes, we learn a possibly discontinuous closed-loop policy on the entire state space by switching between two independently trained Gaussian processes, where a binary classifier selects which Gaussian process predicts the optimal control signal. We show that GPDP yields an almost optimal solution to an LQ problem using few sample points. Moreover, we successfully apply GPDP to the underpowered pendulum swing-up, a complex nonlinear control problem.
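    For intuition, the following sketch shows a single Bellman backup in which the value function over a finite support set of states is modeled with a GP, so it can be evaluated at arbitrary successor states. The kernel choice, the discrete candidate-control set, and the scikit-learn GP are illustrative assumptions, and the classifier-based switching between the two policy GPs is omitted; this is not the paper's implementation.

```python
# Minimal sketch: one Bellman backup with a GP model of the value function
# over a finite support set. `dynamics(s, u)` and `reward(s, u)` are
# assumed user-supplied callables.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bellman_backup(support_states, V_values, candidate_controls,
                   dynamics, reward, gamma=0.95):
    """Return updated values and greedy controls on the support set."""
    V_gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    V_gp.fit(support_states, V_values)        # GP model of the value function

    new_V, greedy_u = [], []
    for s in support_states:
        # Evaluate r(s, u) + gamma * V(f(s, u)) for every candidate control.
        q = [reward(s, u)
             + gamma * V_gp.predict(dynamics(s, u).reshape(1, -1))[0]
             for u in candidate_controls]
        new_V.append(max(q))
        greedy_u.append(candidate_controls[int(np.argmax(q))])
    return np.array(new_V), np.array(greedy_u)
```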

    PIPPS: Flexible model-based policy search robust to the curse of chaos

    The exploding gradient problem has previously been identified as a central issue in deep learning and model-based reinforcement learning because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning suggest that the problem is not just a numerical issue: it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, but their direction also becomes essentially random. We show that reparameterization gradients suffer from this problem, while likelihood ratio gradients are robust. Using these insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible and allows for almost arbitrary models and policies while matching the performance of previous data-efficient learning algorithms. Finally, we introduce the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance and sometimes improving over reparameterization gradients by a factor of 10^6.
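    The contrast between the two gradient estimators can be made concrete on a toy objective. The sketch below (a simplified illustration, not PIPPS or the total propagation algorithm) estimates d/dμ E[f(x)] for x ~ N(μ, σ²) with a reparameterization (pathwise) estimator and with a likelihood ratio estimator using a mean baseline; the function names and the baseline choice are illustrative assumptions.

```python
# Toy comparison of pathwise (reparameterization) and likelihood ratio
# gradient estimators for d/dmu E[f(x)], x ~ N(mu, sigma^2).
import numpy as np

def reparam_grad(f, df, mu, sigma, n=10_000, seed=0):
    """Pathwise estimator: differentiate through the sampled path."""
    eps = np.random.default_rng(seed).standard_normal(n)
    return df(mu + sigma * eps).mean()

def likelihood_ratio_grad(f, mu, sigma, n=10_000, seed=0):
    """Score-function estimator with a mean baseline for variance reduction."""
    x = np.random.default_rng(seed).normal(mu, sigma, n)
    score = (x - mu) / sigma**2            # d/dmu log N(x; mu, sigma^2)
    fx = f(x)
    return ((fx - fx.mean()) * score).mean()

# Example: for f = sin, both estimates should approach cos(mu) * exp(-sigma^2 / 2).
print(reparam_grad(np.sin, np.cos, mu=0.3, sigma=0.1),
      likelihood_ratio_grad(np.sin, mu=0.3, sigma=0.1))
```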

    Manifold Gaussian Processes for regression

    Off-the-shelf Gaussian Process (GP) covariance functions encode smoothness assumptions about the structure of the function to be modeled. For complex and non-differentiable functions, these smoothness assumptions are often too restrictive. One way to alleviate this limitation is to find a different representation of the data by introducing a feature space. This feature space is often learned in an unsupervised way, which can lead to data representations that are not useful for the overall regression task. In this paper, we propose Manifold Gaussian Processes, a novel supervised method that jointly learns a transformation of the data into a feature space and a GP regression from the feature space to the observed space. The Manifold GP is a full GP and allows learning data representations that are useful for the overall regression task. As a proof of concept, we evaluate our approach on complex non-smooth functions where standard GPs perform poorly, such as step functions and robotics tasks with contacts. The research leading to these results has received funding from the European Council under grant agreement #600716 (CoDyCo - FP7/2007–2013). M. P. Deisenroth was supported by a Google Faculty Research Award.
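    A minimal sketch of the joint-learning idea follows, under assumptions the abstract does not spell out: a single tanh layer as the feature map, an RBF kernel with Gaussian noise on the features, and joint optimization of all parameters by numerically differentiated maximization of the GP log marginal likelihood. It is a proof-of-concept illustration, not the authors' implementation.

```python
# Sketch: jointly fit a parametric feature map M(x) = tanh(W x + b) and the
# hyperparameters of an RBF GP on the features by minimizing the negative
# log marginal likelihood (constants dropped).
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(params, X, y, n_feat):
    d = X.shape[1]
    W = params[:n_feat * d].reshape(n_feat, d)
    b = params[n_feat * d:n_feat * d + n_feat]
    log_ell, log_sf, log_sn = params[-3:]

    H = np.tanh(X @ W.T + b)                        # learned feature space
    sq = ((H[:, None, :] - H[None, :, :]) ** 2).sum(-1)
    K = np.exp(2 * log_sf) * np.exp(-0.5 * sq / np.exp(2 * log_ell))
    K += np.exp(2 * log_sn) * np.eye(len(X))        # observation noise
    try:
        L = np.linalg.cholesky(K)
    except np.linalg.LinAlgError:
        return 1e10                                 # penalize ill-conditioned kernels
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum()

def fit_manifold_gp(X, y, n_feat=5, seed=0):
    """Return jointly optimized feature-map and kernel parameters."""
    rng = np.random.default_rng(seed)
    p0 = np.concatenate([0.1 * rng.standard_normal(n_feat * (X.shape[1] + 1)),
                         np.zeros(3)])              # [W, b, log hyperparameters]
    res = minimize(neg_log_marginal_likelihood, p0, args=(X, y, n_feat),
                   method="L-BFGS-B")
    return res.x
```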

    Initial fixation placement in face images is driven by top-down guidance

    The eyes are often inspected first and for longer periods during face exploration. To examine whether this saliency of the eye region at the early stage of face inspection is attributable to its local structural properties or to knowledge of its importance in facial communication, in this study we investigated the pattern of eye movements produced by rhesus monkeys (Macaca mulatta) as they freely viewed images of monkey faces. Eye positions were recorded accurately using implanted eye coils while images of original faces, faces with scrambled eyes, and scrambled faces except for the eyes were presented on a computer screen. The eye region in the scrambled faces attracted the same proportion of viewing time and fixations as it did in the original faces, and even the scrambled eyes attracted a substantial proportion of viewing time and fixations. Furthermore, the monkeys often made the first saccade towards the location of the eyes regardless of image content. Our results suggest that initial fixation placement in faces is driven predominantly by ‘top-down’ or internal factors, such as prior knowledge of the location of “eyes” within the context of a face.

    Impulsive Multivariate Interference Models for IoT Networks

    Device density in wireless internet of things (IoT) networks is rapidly increasing and is expected to continue growing in the coming years. As a consequence, interference is a crucial limiting factor on network performance. This is true for all protocols operating in ISM bands (such as SigFox and LoRa) and in licensed bands (such as NB-IoT). In this paper, with the aim of improving system design, we study the statistics of the interference generated by devices in IoT networks, particularly those exploiting NB-IoT. Existing theoretical and experimental work has suggested that the interference on each subband is well modeled by impulsive noise, such as α-stable noise. If devices operate on multiple, partially overlapping resource blocks (an option standardized in NB-IoT), complex statistical dependence between the interference on different subbands is introduced. To characterize the multivariate statistics of interference on multiple subbands, we develop a new model based on copula theory and demonstrate that it effectively captures both the marginal α-stable model and the dependence structure induced by overlapping resource blocks. We also develop a low-complexity estimation procedure tailored to our interference model, which means that the copula model can often be expressed in terms of standard network parameters without significant calibration delays. We then apply our interference model to optimize receiver design, which provides a tractable means of outperforming existing methods for a wide range of network parameters.
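    As a rough illustration of the modeling idea (not the paper's estimation procedure), the sketch below draws interference samples on two partially overlapping subbands with symmetric α-stable marginals coupled by a Gaussian copula; the α, scale, and correlation values are placeholders.

```python
# Sketch: dependent interference on two subbands, alpha-stable marginals
# tied together by a Gaussian copula (illustrative parameters only).
import numpy as np
from scipy.stats import norm, levy_stable

def sample_subband_interference(n, alpha, scale, corr, seed=0):
    """Draw n joint interference samples, one column per subband."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n)
    u = norm.cdf(z)                      # Gaussian copula: uniform marginals
    # Map each uniform to a symmetric alpha-stable marginal (beta = 0).
    # Note: levy_stable.ppf is evaluated numerically and can be slow.
    return levy_stable.ppf(u, alpha, 0.0, scale=scale)

# Two resource blocks whose overlap is assumed to induce correlation ~0.6.
corr = np.array([[1.0, 0.6],
                 [0.6, 1.0]])
samples = sample_subband_interference(50, alpha=1.5, scale=1.0, corr=corr)
```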